`DLPack` to `mdspan` by fbusato · Pull Request #7047 · NVIDIA/cccl

fbusato · 2025-12-23T01:43:27Z

Description

The PR implements conversion utilities that take a DLTensor view and produce a (host/device/managed) mdspan of the same underlying memory.

The opposite conversion is implemented in mdspan to DLPack #7027. #7027 is also a prerequisite of this PR.
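
As a rough illustration of the DLTensor-to-mdspan direction, the sketch below reads the fields that a DLTensor carries (`data`, `ndim`, `shape`, `strides`, `byte_offset`) and computes the element that a strided view would address. `TensorView` is a hypothetical stand-in for DLTensor, and `resolve_strides` and `element_at` are illustrative helpers; none of them is the API added by this PR. The sketch assumes that strides are counted in elements and that a null `strides` pointer means compact row-major, and it rejects non-positive strides because `layout_stride` could not represent them.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the DLTensor fields relevant to the conversion.
struct TensorView {
  void* data;
  std::int32_t ndim;
  const std::int64_t* shape;
  const std::int64_t* strides;  // in elements; nullptr => compact row-major (assumption)
  std::uint64_t byte_offset;
};

// Returns strides in elements, synthesizing row-major strides when none are given.
// Throws if a stride is not positive, which layout_stride could not represent.
inline std::vector<std::int64_t> resolve_strides(const TensorView& t) {
  const std::size_t rank = static_cast<std::size_t>(t.ndim);
  std::vector<std::int64_t> s(rank);
  if (t.strides == nullptr) {
    std::int64_t running = 1;
    for (std::size_t d = rank; d-- > 0;) {
      s[d] = running;
      running *= t.shape[d];
    }
    return s;
  }
  for (std::size_t d = 0; d < rank; ++d) {
    if (t.strides[d] <= 0) {
      throw std::invalid_argument("non-positive stride");
    }
    s[d] = t.strides[d];
  }
  return s;
}

// Addresses one element of the view, applying byte_offset to the base pointer.
template <class T>
T& element_at(const TensorView& t, const std::vector<std::int64_t>& idx) {
  const std::vector<std::int64_t> s = resolve_strides(t);
  std::int64_t linear = 0;
  for (std::size_t d = 0; d < s.size(); ++d) {
    linear += idx[d] * s[d];
  }
  auto* base = reinterpret_cast<T*>(static_cast<char*>(t.data) + t.byte_offset);
  return base[linear];
}
```

A tensor with zero or negative strides makes `resolve_strides` throw here, which is the case the later discussion of custom layout mappings addresses.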

Todo:

documentation

mhoemmen · 2026-01-08T23:14:33Z

@leofang wrote:

> What does a custom layout mapping entail, exactly?

A "custom layout mapping" is a user-defined type that meets the layout mapping requirements.

This file in the reference mdspan implementation's tests has examples.

It should be possible for us to write a custom layout mapping that supports arbitrary DLPack layouts. It would need to store an offset as well as strides, so that negative strides would still result in a nonnegative mapping result.
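
One way to picture the offset idea is the sketch below. For every dimension with a negative stride, the smallest mapped value is reached at the last index, so shifting by the sum of those minima keeps every mapping result nonnegative. `relaxed_offset` and `map_index` are hypothetical helper names used only for illustration; they are not part of mdspan or of this PR.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Smallest shift that makes offset + dot(indices, strides) >= 0 for all valid indices.
template <std::size_t Rank>
constexpr std::int64_t relaxed_offset(const std::array<std::int64_t, Rank>& extents,
                                      const std::array<std::int64_t, Rank>& strides) {
  std::int64_t offset = 0;
  for (std::size_t r = 0; r < Rank; ++r) {
    if (strides[r] < 0 && extents[r] > 0) {
      // The most negative contribution of dimension r is (extent - 1) * stride.
      offset += (extents[r] - 1) * -strides[r];
    }
  }
  return offset;
}

// Evaluates the mapping: offset plus the dot product of indices and strides.
template <std::size_t Rank>
constexpr std::int64_t map_index(const std::array<std::int64_t, Rank>& indices,
                                 const std::array<std::int64_t, Rank>& strides,
                                 std::int64_t offset) {
  std::int64_t result = offset;
  for (std::size_t r = 0; r < Rank; ++r) {
    result += indices[r] * strides[r];
  }
  return result;
}
```

For a reversed 1-D view with extent 4 and stride -1, the offset is 3, so index 0 maps to 3 and index 3 maps to 0.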

> Is submdspan a must-have for DLPack / Python users? I get that it is a thing that C++ users need (due to the standardization), but I can't see how we'd use it in Python. Assuming we are allowed to not worry about submdspan, is there any other reason that requires positive strides?

* layout_stride needs nonzero strides if all the extents are nonzero, because otherwise the mapping would not be unique. The C++ Standard specifies that layout_stride::mapping is always unique. This has nothing to do with submdspan.
* layout_stride needs nonnegative strides because otherwise evaluating the mapping could give a negative result. That would violate the layout mapping requirements. This has nothing to do with submdspan.

The following paper explains these issues: https://isocpp.org/files/papers/P3959R0.html .
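
Both constraints can be checked by brute force on a small example. The sketch below enumerates every index of a rank-2 shape, evaluates the plain dot product of indices and strides (no offset), and reports whether any result is negative or repeated. `check_strides` and `StrideReport` are hypothetical names used only for this illustration.

```cpp
#include <cstdint>
#include <set>

struct StrideReport {
  bool unique;       // no two indices map to the same value
  bool nonnegative;  // no index maps to a negative value
};

// Exhaustively evaluates i * s0 + j * s1 over the index space [0, e0) x [0, e1).
inline StrideReport check_strides(std::int64_t e0, std::int64_t e1,
                                  std::int64_t s0, std::int64_t s1) {
  std::set<std::int64_t> seen;
  StrideReport report{true, true};
  for (std::int64_t i = 0; i < e0; ++i) {
    for (std::int64_t j = 0; j < e1; ++j) {
      const std::int64_t mapped = i * s0 + j * s1;
      if (mapped < 0) {
        report.nonnegative = false;
      }
      if (!seen.insert(mapped).second) {
        report.unique = false;  // a second index produced the same value
      }
    }
  }
  return report;
}
```

With extents (3, 5), strides (5, 1) pass both checks, strides (0, 1) lose uniqueness because the first index no longer affects the result, and strides (5, -1) produce negative values such as -4 for index (0, 4).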

mhoemmen · 2026-01-09T17:46:31Z

Here is a rough draft of a custom layout mapping that would support DLPack's layout (including zero and negative strides): https://godbolt.org/z/WEYazcsxT . I've commented out submdspan_mapping so the type doesn't yet support submdspan, but it's still a legal layout mapping. Getting it to support submdspan wouldn't be too hard.

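// The reference mdspan single-header implementation must also be included;
// the detail:: and experimental:: names used below are assumed to come from it.
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <span>
#include <tuple>
#include <type_traits>
#include <utility>

namespace std {
  // work around issue in single header reference implementation
  using ::std::experimental::dims;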

  namespace impl {
    // [mdspan.layout.stride.expo] 2 defines OFFSET(m).
    // It's only ever applied to strided layout mappings.
    template<class Mapping>
      requires(Mapping::is_always_strided())
    constexpr auto offset(const Mapping& m) {
      if constexpr (Mapping::extents_type::rank() == 0) {
        return m();
      }
      else {
        using index_type = typename Mapping::index_type;
        constexpr auto rank = Mapping::extents_type::rank();
        bool any_zero = false;
        for (::std::size_t r = 0; r < rank; ++r) {
          any_zero = any_zero || (m.extents().extent(r) == 0);
        }
        if (any_zero) {
          return index_type(0);
        }
        else {
          constexpr auto zeros =
            []< ::std::size_t... Rs>(std::index_sequence<Rs...>) {
              return std::tuple{((void) Rs, index_type(0))...};
            } (std::make_index_sequence<rank>());
          return std::apply(m, zeros);
        }
      }
    }
  }

  class layout_stride_relaxed {
  public:
    template<class Extents>
    class mapping;
  };

  template<class Extents>
  class layout_stride_relaxed::mapping {
  public:
    using extents_type = Extents;
    using index_type = extents_type::index_type;
    using size_type = extents_type::size_type;
    using rank_type = extents_type::rank_type;
    using layout_type = layout_stride_relaxed;

  private:
    static constexpr rank_type rank_ = extents_type::rank();

  public:
    constexpr mapping() noexcept {
      const layout_right::mapping<extents_type> map{};
      for (std::size_t d = 0; d < rank_; ++d) {
        strides_[d] = map.stride(d);
      }
    }

    constexpr mapping(const mapping&) noexcept = default;

    template<class OtherIndexType>
    requires(
      ::std::is_convertible_v<const OtherIndexType&, ::std::intptr_t> &&
      ::std::is_nothrow_constructible_v<::std::intptr_t, const OtherIndexType&>
    )
    constexpr mapping(
        const extents_type& e,
        ::std::span<OtherIndexType, rank_> s,
        ::std::size_t offset = 0) noexcept
      : extents_(e), offset_(offset)
    {
      for (::std::size_t d = 0; d < rank_; ++d) {
        strides_[d] = s[d];
      }
    }

    template<class OtherIndexType>
    requires(
      is_convertible_v<const OtherIndexType&, ::std::intptr_t> &&
      is_nothrow_constructible_v<::std::intptr_t, const OtherIndexType&>
    )
    constexpr mapping(
        const extents_type& e,
        const ::std::array<OtherIndexType, rank_>& s,
        ::std::size_t offset = 0) noexcept
      : extents_(e), offset_(offset)
    {
      for (::std::size_t d = 0; d < rank_; ++d) {
        strides_[d] = s[d];
      }
    }

    // m IS a layout_stride_relaxed::mapping
    template<class OtherMapping>
    requires(
      detail::layout_mapping_alike<OtherMapping> &&
      ::std::is_constructible_v<
        extents_type,
        typename OtherMapping::extents_type
      > &&
      ::std::is_same_v<
        layout_type,
        typename OtherMapping::layout_type
      >
    )
    constexpr explicit(
      ! (
        ::std::is_convertible_v<
          typename OtherMapping::extents_type, extents_type
        >
      )
    )
    mapping(const OtherMapping& m) noexcept
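      // Use the public observer: the private member of a different
      // specialization is not accessible here.
      : extents_(m.extents()), offset_(m.offset())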
    {
      for (std::size_t d = 0; d < rank_; ++d) {
        strides_[d] = m.stride(d);
      }
    }

    // m is NOT a layout_stride_relaxed::mapping
    template<class StridedLayoutMapping>
    requires(
      detail::layout_mapping_alike<StridedLayoutMapping> &&
      ::std::is_constructible_v<
        extents_type,
        typename StridedLayoutMapping::extents_type> &&
      StridedLayoutMapping::is_always_unique() &&
      StridedLayoutMapping::is_always_strided()
    )
    constexpr explicit(
      ! (
        ::std::is_convertible_v<
          typename StridedLayoutMapping::extents_type, extents_type
        > && (
          detail::is_mapping_of<layout_left, StridedLayoutMapping> ||
          detail::is_mapping_of<layout_right, StridedLayoutMapping> ||
          experimental::detail::is_layout_left_padded_mapping<
            StridedLayoutMapping>::value ||
          experimental::detail::is_layout_right_padded_mapping<
            StridedLayoutMapping>::value ||
          detail::is_mapping_of<layout_stride, StridedLayoutMapping>
        )
      )
    )
    mapping(const StridedLayoutMapping& m) noexcept
      : extents_(m.extents())
    {
      for (std::size_t d = 0; d < rank_; ++d) {
        strides_[d] = m.stride(d);
      }
    }
 
    constexpr mapping& operator=(const mapping&) noexcept = default;

    // [mdspan.layout.stride.obs], observers
    constexpr const extents_type& extents() const noexcept {
      return extents_;
    }
    constexpr ::std::array<index_type, rank_> strides() const noexcept {
      return strides_;
    }
    constexpr ::std::intptr_t offset() const noexcept {
      return offset_;
    }

    constexpr index_type required_span_size() const noexcept {
      // The dot product of indices and strides is linear.
      // Thus, over all valid indices, the max value of the
      // dot product is achieved at the extrema: either the
      // min index (0) if the stride is negative, or the max
      // index (extent(r) - 1) if the stride is nonnegative.
      // The layout mapping requirements ask for 0 if the index
      // space is empty, else 1 plus the max mapping result.
      ::std::intptr_t dot = 0;
      for (std::size_t r = 0; r < rank_; ++r) {
        const index_type ext = extents_.extent(r);
        if (ext == 0) {
          return index_type(0);
        }
        if (strides_[r] > 0) {
          dot += static_cast<::std::intptr_t>(ext - 1) * strides_[r];
        }
      }
      return static_cast<index_type>(offset() + dot + 1);
    }

    template<class... Indices>
    requires(
      sizeof...(Indices) == rank_ &&
      (::std::is_convertible_v<Indices, index_type> && ...) &&
      (::std::is_nothrow_constructible_v<index_type, Indices> && ...)
    )
    constexpr index_type operator()(Indices... inds) const noexcept {
      return offset() +
        [&, this]<::std::size_t... Rs>(::std::index_sequence<Rs...>) {
          return ((inds...[Rs] * strides_[Rs]) + ... + index_type(0));
        } (::std::make_index_sequence<rank_>());
    }

    static constexpr bool is_always_unique() noexcept { return false; }
    static constexpr bool is_always_exhaustive() noexcept { return false; }
    // It's technically NOT always strided, because of the offset
    // (to accommodate negative strides)
    static constexpr bool is_always_strided() noexcept { return false; }

    constexpr bool is_unique() const noexcept {
      // The Standard doesn't require that this be exact.
      // Possibility of negative strides with an offset
      // makes that harder to figure out.
      return false;  
    }
    constexpr bool is_exhaustive() const noexcept {
      // The Standard doesn't require that this be exact.
      // Possibility of negative strides with an offset
      // makes that harder to figure out.
      return false;  
    }
    constexpr bool is_strided() const noexcept {
      return offset_ == 0;
    }

    constexpr index_type stride(rank_type i) const noexcept {
      return strides_[i];
    }

    // y is also a layout_stride_relaxed::mapping
    template<class OtherMapping>
    requires(
      detail::layout_mapping_alike<OtherMapping> &&
      rank_ == OtherMapping::extents_type::rank() &&
      ::std::is_same_v<layout_type, typename OtherMapping::layout_type>
    )
    friend constexpr bool
    operator==(const mapping& x, const OtherMapping& y) noexcept {
      return x.extents() == y.extents() &&
      x.offset_ == y.offset_ &&
      [&]<::std::size_t...Rs> (::std::index_sequence<Rs...>) {
        return ((x.stride(Rs) == y.stride(Rs)) && ...);
      } (::std::make_index_sequence<rank_>());
    }

    // y is NOT a layout_stride_relaxed::mapping but is strided.
    template<class OtherMapping>
    requires(
      detail::layout_mapping_alike<OtherMapping> &&
      rank_ == OtherMapping::extents_type::rank() &&
      OtherMapping::is_always_strided()
    )
    friend constexpr bool
    operator==(const mapping& x, const OtherMapping& y) noexcept {
      return x.extents() == y.extents() &&
      impl::offset(y) == x.offset_ &&
      [&]<::std::size_t...Rs> (::std::index_sequence<Rs...>) {
        return ((x.stride(Rs) == y.stride(Rs)) && ...);
      } (::std::make_index_sequence<rank_>());
    }

  private:
    extents_type extents_{};
    std::intptr_t offset_ = 0;
    array<std::intptr_t, rank_> strides_{};

#if 0
    // [mdspan.sub.map], submdspan mapping specialization
    template<class... SliceSpecifiers>
      constexpr auto submdspan-mapping-impl(SliceSpecifiers...) const
        -> /* see-below */;

    template<class... SliceSpecifiers>
      friend constexpr auto submdspan_mapping(
        const mapping& src, SliceSpecifiers... slices) {
          return src.submdspan-mapping-impl(slices...);
      }
#endif // 0
  };
} // namespace std

int main() {
  std::dims<3> exts(3, 5, 11);  
  std::array<std::intptr_t, 3> strides{0, 1, 5}; // broadcasting
  std::layout_stride_relaxed::mapping<std::dims<3>> map(exts, strides);

  assert(map(0, 1, 1) == map(1, 1, 1));

  return 0;
}
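
The extrema argument used in `required_span_size` can be tried in isolation. The sketch below is a standalone C++17 version, under the assumption that the span size is 0 for an empty index space and otherwise 1 plus the largest mapped value. `span_size` is a hypothetical helper written for this illustration, not part of the draft above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Largest mapped value plus one, or 0 if any extent is 0.
// Only positive strides can raise the maximum; for negative strides the
// maximum is reached at index 0, which contributes nothing.
template <std::size_t Rank>
constexpr std::int64_t span_size(const std::array<std::int64_t, Rank>& extents,
                                 const std::array<std::int64_t, Rank>& strides,
                                 std::int64_t offset) {
  std::int64_t max_mapped = offset;
  for (std::size_t r = 0; r < Rank; ++r) {
    if (extents[r] == 0) {
      return 0;
    }
    if (strides[r] > 0) {
      max_mapped += (extents[r] - 1) * strides[r];
    }
  }
  return max_mapped + 1;
}
```

For the broadcasting example with extents (3, 5, 11) and strides (0, 1, 5), the largest mapped value is 4 * 1 + 10 * 5 = 54, so the span must hold 55 elements.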

docs/libcudacxx/extended_api/mdspan/dlpack_to_mdspan.rst

libcudacxx/include/cuda/__internal/dlpack.h

libcudacxx/include/cuda/__mdspan/dlpack_to_mdspan.h

github-actions · 2026-01-26T21:01:38Z

🥳 CI Workflow Results

🟩 Finished in 1h 41m: Pass: 100%/84 | Total: 1d 03h | Max: 1h 40m | Hits: 97%/199140


fbusato and others added 30 commits December 18, 2025 12:16

first version

750ca5a

add unit test

f040c10

documentation

464ccc2

Update libcudacxx/include/cuda/__mdspan/mdspan_to_dlpack.h

6f32ae9

Co-authored-by: David Bayer <48736217+davebayer@users.noreply.github.com>

Merge branch 'mdspan-to-dlpack' of github.com:fbusato/cccl into mdspan-to-dlpack

3457d3a

add many types

ee05eda

remove operator->

4d2e0da

formatting

f290320

fix MSVC warning

7a22848

improve documentation

f78db30

fix MSVC warning

1467ab2

first version

d844f65

complete the implementation

3843556

add unit test

977909f

cuda.coop: Use cuda.core.experimental.Linker instead of internal numba-cuda Linker to link LTO (NVIDIA#7011)

b0e1fbc

Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com>

Make c2h vector comparisons constexpr (NVIDIA#7009)

50da3d4

improves comments on decoupled lookback example (NVIDIA#7015)

f8a4d06

Extract reduce_op_sync into a free function (NVIDIA#7004)

e9f0a13

This allows us to use it independently

Remove experimental namespace from cuda.core import (NVIDIA#7022)

362d316

reexpress completion signature transform alias to make clangd happy (NVIDIA#7026)

28d22c9

Qualify call to __launch_impl in launch.h to avoid ambiguity errors (NVIDIA#7024)

1e28e8c

Co-authored-by: pciolkosz <pciolkosz@nvidia.com>

Rework hierarchy levels (NVIDIA#6957)

f21a158

* Rework hierarchy levels
* add missing launches to native cluster level queries
* remove dependency on runtime storage

Co-authored-by: pciolkosz <pciolkosz@nvidia.com>

Use vectorized tuning for triad benchmark for dtypes of size 2 (NVIDIA#7019)

1ef85d4

[libcu++] Fix synchronous resource adapter property passing (NVIDIA#6976)

00a1b95

* Fix synchronous resource adapter property passing
* Hide pinned pool on older CUDA versions
* Workaround MSVC bug
* Missing maybe_unused

[libcu++] Remove _view from the shared memory getter name (NVIDIA#6997)

adc23f5

* Remove _view from the shared memory getter
* Forgot about cudax

[thrust] Ignore CUDA free errors in thrust memory resource (NVIDIA#7002)

33aa542

* Ignore CUDA free errors in thrust memory resource
* Add a comment

the <stdexcept> header must be included when using _CCCL_THROW, regardless of exception support (NVIDIA#7028)

6402bc6

Co-authored-by: David Bayer <48736217+davebayer@users.noreply.github.com>

Error out when nvrtcc cannot parse cuda_thread_count (NVIDIA#7035)

5546b87

Allow all public headers to be included with host compilers only (NVIDIA#7012)

58aba1d

fbusato added 2 commits January 7, 2026 09:40

Merge branch 'mdspan-to-dlpack' into dlpack-to-mdspan

1c7f5d4

address comments

9bbf73b

fbusato added a commit to fbusato/cccl that referenced this pull request Jan 7, 2026

address comments from NVIDIA#7047

87b6777